Goto

Collaborating Authors

 Data Mining


Learning Mixture of Gaussians with Streaming Data

Neural Information Processing Systems

In this paper, we study the problem of learning a mixture of Gaussians with streaming data: given a stream of $N$ points in $d$ dimensions generated by an unknown mixture of $k$ spherical Gaussians, the goal is to estimate the model parameters using a single pass over the data stream. We analyze a streaming version of the popular Lloyd's heuristic and show that the algorithm estimates all the unknown centers of the component Gaussians accurately if they are sufficiently separated. Assuming each pair of centers are $C\sigma$ distant with $C=\Omega((k\log k)^{1/4}\sigma)$ and where $\sigma^2$ is the maximum variance of any Gaussian component, we show that asymptotically the algorithm estimates the centers optimally (up to certain constants); our center separation requirement matches the best known result for spherical Gaussians \citep{vempalawang}. For finite samples, we show that a bias term based on the initial estimate decreases at $O(1/{\rm poly}(N))$ rate while variance decreases at nearly optimal rate of $\sigma^2 d/N$. Our analysis requires seeding the algorithm with a good initial estimate of the true cluster centers for which we provide an online PCA based clustering algorithm. Indeed, the asymptotic per-step time complexity of our algorithm is the optimal $d\cdot k$ while space complexity of our algorithm is $O(dk\log k)$. In addition to the bias and variance terms which tend to $0$, the hard-thresholding based updates of streaming Lloyd's algorithm is agnostic to the data distribution and hence incurs an \emph{approximation error} that cannot be avoided. However, by using a streaming version of the classical \emph{(soft-thresholding-based)} EM method that exploits the Gaussian distribution explicitly, we show that for a mixture of two Gaussians the true means can be estimated consistently, with estimation error decreasing at nearly optimal rate, and tending to $0$ for $N\rightarrow \infty$.


Deep Sets

Neural Information Processing Systems

We study the problem of designing models for machine learning tasks defined on sets. In contrast to the traditional approach of operating on fixed dimensional vectors, we consider objective functions defined on sets and are invariant to permutations. Such problems are widespread, ranging from the estimation of population statistics, to anomaly detection in piezometer data of embankment dams, to cosmology. Our main theorem characterizes the permutation invariant objective functions and provides a family of functions to which any permutation invariant objective function must belong. This family of functions has a special structure which enables us to design a deep network architecture that can operate on sets and which can be deployed on a variety of scenarios including both unsupervised and supervised learning tasks. We demonstrate the applicability of our method on population statistic estimation, point cloud classification, set expansion, and outlier detection.


PRUNE: Preserving Proximity and Global Ranking for Network Embedding

Neural Information Processing Systems

We investigate an unsupervised generative approach for network embedding. A multi-task Siamese neural network structure is formulated to connect embedding vectors and our objective to preserve the global node ranking and local proximity of nodes. We provide deeper analysis to connect the proposed proximity objective to link prediction and community detection in the network. We show our model can satisfy the following design properties: scalability, asymmetry, unity and simplicity. Experiment results not only verify the above design properties but also demonstrate the superior performance in learning-to-rank, classification, regression, and link prediction tasks.


Multi-Armed Bandits with Metric Movement Costs

Neural Information Processing Systems

We consider the non-stochastic Multi-Armed Bandit problem in a setting where there is a fixed and known metric on the action space that determines a cost for switching between any pair of actions. The loss of the online learner has two components: the first is the usual loss of the selected actions, and the second is an additional loss due to switching between actions. Our main contribution gives a tight characterization of the expected minimax regret in this setting, in terms of a complexity measure $\mathcal{C}$ of the underlying metric which depends on its covering numbers. In finite metric spaces with $k$ actions, we give an efficient algorithm that achieves regret of the form $\widetilde(\max\set{\mathcal{C}^{1/3}T^{2/3},\sqrt{kT}})$, and show that this is the best possible. Our regret bound generalizes previous known regret bounds for some special cases: (i) the unit-switching cost regret $\widetilde{\Theta}(\max\set{k^{1/3}T^{2/3},\sqrt{kT}})$ where $\mathcal{C}=\Theta(k)$, and (ii) the interval metric with regret $\widetilde{\Theta}(\max\set{T^{2/3},\sqrt{kT}})$ where $\mathcal{C}=\Theta(1)$. For infinite metrics spaces with Lipschitz loss functions, we derive a tight regret bound of $\widetilde{\Theta}(T^{\frac{d+1}{d+2}})$ where $d \ge 1$ is the Minkowski dimension of the space, which is known to be tight even when there are no switching costs.


Multi-Task Learning for Contextual Bandits

Neural Information Processing Systems

Contextual bandits are a form of multi-armed bandit in which the agent has access to predictive side information (known as the context) for each arm at each time step, and have been used to model personalized news recommendation, ad placement, and other applications. In this work, we propose a multi-task learning framework for contextual bandit problems. Like multi-task learning in the batch setting, the goal is to leverage similarities in contexts for different arms so as to improve the agent's ability to predict rewards from contexts. We propose an upper confidence bound-based multi-task learning algorithm for contextual bandits, establish a corresponding regret bound, and interpret this bound to quantify the advantages of learning in the presence of high task (arm) similarity. We also describe an effective scheme for estimating task similarity from data, and demonstrate our algorithm's performance on several data sets.


Sparse Embedded k -Means Clustering

Neural Information Processing Systems

The $k$-means clustering algorithm is a ubiquitous tool in data mining and machine learning that shows promising performance. However, its high computational cost has hindered its applications in broad domains. Researchers have successfully addressed these obstacles with dimensionality reduction methods. Recently, [1] develop a state-of-the-art random projection (RP) method for faster $k$-means clustering. Their method delivers many improvements over other dimensionality reduction methods.


EX2: Exploration with Exemplar Models for Deep Reinforcement Learning

Neural Information Processing Systems

Deep reinforcement learning algorithms have been shown to learn complex tasks using highly general policy classes. However, sparse reward problems remain a significant challenge. Exploration methods based on novelty detection have been particularly successful in such settings but typically require generative or predictive models of the observations, which can be difficult to train when the observations are very high-dimensional and complex, as in the case of raw images. We propose a novelty detection algorithm for exploration that is based entirely on discriminatively trained exemplar models, where classifiers are trained to discriminate each visited state against all others. Intuitively, novel states are easier to distinguish against other states seen during training. We show that this kind of discriminative modeling corresponds to implicit density estimation, and that it can be combined with count-based exploration to produce competitive results on a range of popular benchmark tasks, including state-of-the-art results on challenging egocentric observations in the vizDoom benchmark.


On clustering network-valued data

Neural Information Processing Systems

Community detection, which focuses on clustering nodes or detecting communities in (mostly) a single network, is a problem of considerable practical interest and has received a great deal of attention in the research community. While being able to cluster within a network is important, there are emerging needs to be able to \emph{cluster multiple networks}. This is largely motivated by the routine collection of network data that are generated from potentially different populations. These networks may or may not have node correspondence. When node correspondence is present, we cluster networks by summarizing a network by its graphon estimate, whereas when node correspondence is not present, we propose a novel solution for clustering such networks by associating a computationally feasible feature vector to each network based on trace of powers of the adjacency matrix. We illustrate our methods using both simulated and real data sets, and theoretical justifications are provided in terms of consistency.


Dueling Bandits: Beyond Condorcet Winners to General Tournament Solutions

Neural Information Processing Systems

Recent work on deriving $O(\log T)$ anytime regret bounds for stochastic dueling bandit problems has considered mostly Condorcet winners, which do not always exist, and more recently, winners defined by the Copeland set, which do always exist. In this work, we consider a broad notion of winners defined by tournament solutions in social choice theory, which include the Copeland set as a special case but also include several other notions of winners such as the top cycle, uncovered set, and Banks set, and which, like the Copeland set, always exist. We develop a family of UCB-style dueling bandit algorithms for such general tournament solutions, and show $O(\log T)$ anytime regret bounds for them. Experiments confirm the ability of our algorithms to achieve low regret relative to the target winning set of interest.


Community Detection on Evolving Graphs

Neural Information Processing Systems

Clustering is a fundamental step in many information-retrieval and data-mining applications. Detecting clusters in graphs is also a key tool for finding the community structure in social and behavioral networks. In many of these applications, the input graph evolves over time in a continual and decentralized manner, and, to maintain a good clustering, the clustering algorithm needs to repeatedly probe the graph. Furthermore, there are often limitations on the frequency of such probes, either imposed explicitly by the online platform (e.g., in the case of crawling proprietary social networks like twitter) or implicitly because of resource limitations (e.g., in the case of crawling the web). In this paper, we study a model of clustering on evolving graphs that captures this aspect of the problem. Our model is based on the classical stochastic block model, which has been used to assess rigorously the quality of various static clustering methods. In our model, the algorithm is supposed to reconstruct the planted clustering, given the ability to query for small pieces of local information about the graph, at a limited rate. We design and analyze clustering algorithms that work in this model, and show asymptotically tight upper and lower bounds on their accuracy. Finally, we perform simulations, which demonstrate that our main asymptotic results hold true also in practice.